Force-pushed from cc8fca8 to 33264be
/// register. There may be multiple current definitions for a register with
/// disjunct lanemasks.
- VReg2SUnitMultiMap CurrentVRegDefs;
+ VReg2SUnitOperIdxMultiMap CurrentVRegDefs;
This was asymmetric between Uses and Defs. We need the operand index of the outstanding defs to compute operand latencies.
// Use TRI's regsOverlap which handles both physical and virtual registers,
// including subregisters and lane masks
return TRI->regsOverlap(SrcReg, DstReg);
I guess this was only needed transiently, but it looks really good.
Nice, they work on RegUnits.
-     PostSWP->isPostPipelineCandidate(*TheBlock))
-   staticallyMaterializeMultiSlotInstructions(*TheBlock, HR);
+     PostSWP->isPostPipelineCandidate(*TheBlock)) {
+   staticallyMaterializeMultiSlotInstructions(*TheBlock, HR, MaterializeAll);
Would have been nice to be able to skip the scheduler before postpipelining. Sadly, the scheduler sometimes makes better decisions.
for (int T = 0; T < II; ++T) {
  LaneBitmask Mask = LanesByOffset[T];
  if (Mask.any()) {
    // Show a simple indicator - could be enhanced to show actual lanes
Indeed. Full lanemasks are bulky though.
static cl::opt<bool> TestRegDefUseTracker(
    "aie-test-regdefuse-tracker", cl::Hidden, cl::init(false),
    cl::desc("[AIE] TEST MODE: Run RegDefUseTracker analysis on all loops "
             "(for testing only)"));
This is accommodating a dump for the early stages of live range analysis.
void BlockState::restorePipelining() {
  // Restore to the original allocation of the virtual registers
  RegTracker->restoreOriginalPhysRegs();
These registers were used by the scheduler whose result we're going to use as a fallback.
Force-pushed from 7930abc to dcc908c
BS.FixPoint.PipelinerMode = firstPipelinerMode();
if (BS.FixPoint.PipelinerMode != PostPipelinerMode::None) {
  return SchedulingStage::Pipelining;
}
This looks a bit weird: we have been pipelining and are trying to restore the first allowed pipeliner mode for the next II. That mode should be invariant, so I don't think we can get None here. Perhaps assert instead.
// For virtual mode, re-analyze and virtualize
if (FixPoint.PipelinerMode == PostPipelinerMode::Virtual) {
  // RegTracker might not exist if we have multiple regions
Someone missed that we can't do physical mode either if we have more than one region.
I would hope that RegTracker is always there for a SWP candidate.
Force-pushed from dcc908c to 133f034
void RegLiveRangeTracker::computeAliasClosure(MCRegister Reg,
                                              DenseSet<MCRegister> &Out) const {
  Out.insert(Reg);
Could we remove this by passing /*IncludeSelf=*/true in the first for loop?
Even better, we remove it in my current branch since it is suboptimal.
// PostPipelinerMode determines whether the postpipeliner operates on physical
// registers or virtualizes them for better scheduling opportunities.
enum class PostPipelinerMode { None, Physical, Virtual };
None is just a default value. It can also be used as a sentinel to indicate end() when iterating through pipeliner modes.
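A minimal sketch of the sentinel use described above, written as standalone C++. The helpers `firstMode`, `nextMode` and `allModes` are hypothetical names for illustration, and the Physical-then-Virtual order is an assumption; the PR's own `firstPipelinerMode()` may select modes differently.

```cpp
#include <cassert>
#include <vector>

// Mirrors the enum quoted in the diff.
enum class PostPipelinerMode { None, Physical, Virtual };

// Hypothetical helper: the first mode to try. Assumes Physical is always
// allowed; the real selection may depend on options.
inline PostPipelinerMode firstMode() { return PostPipelinerMode::Physical; }

// Hypothetical helper: advance to the next mode. None plays the role of
// end() when there is nothing left to try.
inline PostPipelinerMode nextMode(PostPipelinerMode M) {
  switch (M) {
  case PostPipelinerMode::Physical:
    return PostPipelinerMode::Virtual;
  case PostPipelinerMode::Virtual:
  case PostPipelinerMode::None:
    return PostPipelinerMode::None;
  }
  return PostPipelinerMode::None;
}

// Collects every mode visited by the sentinel-terminated loop.
inline std::vector<PostPipelinerMode> allModes() {
  std::vector<PostPipelinerMode> Modes;
  for (PostPipelinerMode M = firstMode(); M != PostPipelinerMode::None;
       M = nextMode(M))
    Modes.push_back(M);
  return Modes;
}
```

With this shape, a loop over modes never needs a separate "done" flag, which is also why getting None back from the first-mode query would indicate a broken invariant.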
This abstracts the live ranges to be used by PostRegAlloc.
This module analyses live ranges of physical registers that can be safely reallocated in a basic block. It supplies facilities to rewrite to virtual registers and to restore the original allocation.
This module produces an EventSchedule from the instructions and their issue cycle. The event schedule contains the read and write events of the virtual registers occurring in the instructions, ordered in the processor pipeline stage timeline. From the EventSchedule, the modulo live ranges for a particular II can be constructed. These represent the lanes of each register that are live at a particular point.
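The folding step from a linear schedule to modulo live ranges could look roughly like the sketch below. It is not the PR's code: `Segment`, `ModuloRange` and `foldModulo` are hypothetical names, `LaneMask` is a plain integer standing in for `llvm::LaneBitmask`, and the half-open interval `[Def, LastUse)` is an assumed liveness convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for llvm::LaneBitmask; one bit per lane.
using LaneMask = std::uint64_t;

// Hypothetical linear live segment of one register: the given lanes are
// occupied in cycles [Def, LastUse). Assumes write and read events have
// already been paired up.
struct Segment {
  int Def;
  int LastUse;
  LaneMask Lanes;
};

struct ModuloRange {
  std::vector<LaneMask> LanesByOffset; // one entry per offset 0..II-1
  bool SelfOverlap = false;            // a lane was live twice at one offset
};

// Folds linear segments onto the II offsets of the modulo schedule.
ModuloRange foldModulo(const std::vector<Segment> &Segments, int II) {
  assert(II > 0 && "II must be positive");
  ModuloRange R;
  R.LanesByOffset.assign(II, 0);
  for (const Segment &S : Segments) {
    for (int C = S.Def; C < S.LastUse; ++C) {
      int T = ((C % II) + II) % II; // also safe for negative cycles
      if (R.LanesByOffset[T] & S.Lanes)
        R.SelfOverlap = true; // range wraps onto itself at this II
      R.LanesByOffset[T] |= S.Lanes;
    }
  }
  return R;
}
```

A range longer than II necessarily wraps onto itself, so `SelfOverlap` is a cheap signal that the register cannot hold the value at that II.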
This is a dedicated register allocator for use by the postpipeliner. We compute some metrics, and run with a few different score functions on those metrics to define an allocation order. We allocate in that order, and fail as soon as we can't find a register that is available over the live range.
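As a rough illustration of the allocation loop described above, here is a self-contained sketch. All names (`LiveRange`, `Allocation`, `allocate`, `allocateWithScores`, `longFirst`, `shortFirst`, `exampleRanges`) are hypothetical, the single `Length` metric and the first-fit register choice are assumptions, and modulo live ranges are simplified to a bitmask of occupied offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

struct LiveRange {
  std::uint32_t Slots; // bit T set: live at modulo offset T (assumes II <= 32)
  int Length;          // example metric: number of live offsets
};

using ScoreFn = std::function<int(const LiveRange &)>;

struct Allocation {
  bool Ok;
  std::vector<int> RegOf; // register index per range; meaningful only if Ok
};

// Allocates in descending score order and fails at the first range for
// which no register is free over all of its offsets.
Allocation allocate(const std::vector<LiveRange> &Ranges, int NumRegs,
                    const ScoreFn &Score) {
  std::vector<std::size_t> Order(Ranges.size());
  std::iota(Order.begin(), Order.end(), std::size_t(0));
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t A, std::size_t B) {
                     return Score(Ranges[A]) > Score(Ranges[B]);
                   });
  std::vector<std::uint32_t> Busy(NumRegs, 0);
  Allocation Result{true, std::vector<int>(Ranges.size(), -1)};
  for (std::size_t Idx : Order) {
    int Found = -1;
    for (int R = 0; R < NumRegs && Found < 0; ++R)
      if ((Busy[R] & Ranges[Idx].Slots) == 0)
        Found = R;
    if (Found < 0) {
      Result.Ok = false;
      return Result;
    }
    Busy[Found] |= Ranges[Idx].Slots;
    Result.RegOf[Idx] = Found;
  }
  return Result;
}

// Tries each score function in turn and returns the first success.
Allocation allocateWithScores(const std::vector<LiveRange> &Ranges,
                              int NumRegs,
                              const std::vector<ScoreFn> &Scores) {
  Allocation Last{false, {}};
  for (const ScoreFn &S : Scores) {
    Last = allocate(Ranges, NumRegs, S);
    if (Last.Ok)
      return Last;
  }
  return Last;
}

// Two illustrative score functions over the Length metric.
int longFirst(const LiveRange &R) { return R.Length; }
int shortFirst(const LiveRange &R) { return -R.Length; }

// Four ranges for which the allocation order matters with two registers.
std::vector<LiveRange> exampleRanges() {
  return {{0x1, 1}, {0x4, 1}, {0x3, 2}, {0x6, 2}};
}
```

On `exampleRanges()` with two registers, the short-first order runs out of registers while the long-first order succeeds, which is the reason for trying more than one score function.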
This is a strategy that prioritizes scheduling of scarce ranges. Scarce ranges are live ranges that compete for one available register. The live ranges are virtualized, which means we have no serializing WAR deps. However, we need to be careful not to have more than one live at a time, which means we want to finish the range before starting a new one. We try all legal permutations of these live ranges. For the current live range, we first prioritize all its ancestors, then the instructions in the range itself. Once we are finished with the range, we simulate the WAR dependences that are necessary to keep the next ranges non-overlapping.
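The permutation part of this strategy can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: ranges are plain indices, "legal" is taken to mean consistent with a given list of must-precede pairs, and `legalOrders` and `simulatedWarEdges` are hypothetical names. Ancestor prioritization within a range is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Enumerates every order of NumRanges scarce ranges that respects the
// given constraints. A pair (A, B) means range A must finish before
// range B starts, for example because B consumes a value derived from A.
std::vector<std::vector<int>>
legalOrders(int NumRanges,
            const std::vector<std::pair<int, int>> &MustPrecede) {
  std::vector<int> Order(NumRanges);
  std::iota(Order.begin(), Order.end(), 0);
  std::vector<std::vector<int>> Result;
  do {
    // Pos[R] is the position of range R in the candidate order.
    std::vector<int> Pos(NumRanges);
    for (int I = 0; I < NumRanges; ++I)
      Pos[Order[I]] = I;
    bool Legal = std::all_of(MustPrecede.begin(), MustPrecede.end(),
                             [&](const std::pair<int, int> &P) {
                               return Pos[P.first] < Pos[P.second];
                             });
    if (Legal)
      Result.push_back(Order);
  } while (std::next_permutation(Order.begin(), Order.end()));
  return Result;
}

// For a chosen order, returns the simulated WAR edges (Prev, Next) that
// keep consecutive ranges from overlapping on the shared register.
std::vector<std::pair<int, int>>
simulatedWarEdges(const std::vector<int> &Order) {
  std::vector<std::pair<int, int>> Edges;
  for (std::size_t I = 1; I < Order.size(); ++I)
    Edges.push_back(std::make_pair(Order[I - 1], Order[I]));
  return Edges;
}
```

The number of permutations grows factorially, so this only stays practical while few ranges compete for the same register.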
Force-pushed from 699934a to 383d92b
This is a POC of register allocation during postpipelining.
We add
Status:
It's aggressive enough to reach II=7 on gemm-bfp16-opt0, but sadly, the code it produces is not correct. I'm trying to find out what is causing my diff failure.